Systolic equalizer and method of using same

ABSTRACT

A method and apparatus provide a systolic equalizer for Viterbi equalization of an 8-PSK signal distorted by passage through a communication channel. The systolic equalizer architecture is scalable to process, as examples, four, eight and 16 state received signals. An equalizer (10) in accordance with this invention includes a logical arrangement of a plurality of instantiations of locally coupled processing elements (12) forming a systolic array for processing in common received signal samples having distortion induced by passage through a communication channel. The equalizer outputs soft values for input to a decoder, the soft values representing an approximation of maximum a posteriori (MAP) probabilities. A trellis search procedure is employed to reconstruct estimates of a received signal sequence based on a reduced number of states. The reduced number of states is represented by a plurality of groups determined by partitioning a symbol constellation such that there are fewer groups than possible symbols.

FIELD OF THE INVENTION

[0001] This invention relates generally to electronic circuits known as equalizers. More particularly, this invention relates to a Very Large Scale Integration (VLSI) architecture for implementation of the equalizer for communication systems, where the equalizer comprises a plurality of like processing elements (PEs) that are coupled together in a systolic-type of architecture.

BACKGROUND OF THE INVENTION

[0002] A communications system equalizer is a circuit used in a receiver to compensate a received signal for losses and distortion experienced in a communication path between a transmitter of the signal and the receiver. In RF communication systems, such as cellular telephone systems, conventional practice could construct the equalizer circuit using discrete components or, more recently, using a suitably programmed digital signal processor (DSP). In this approach the DSP is normally not dedicated to performing only the equalizer function, but more typically is responsible for the execution of a number of other signal processing tasks as well. As a result, as data rates continue to increase it has been found that the DSP capacity, and especially the lack of available DSP capacity, has created a problem. The increase in data rates also increases the equalizer algorithm complexity, and thus requires higher DSP processing performance. Simply using a faster and higher powered DSP also creates problems, as this approach requires a significant number of skilled engineers, and a large amount of time, resources and risk, to migrate the existing DSP-executed software applications to a new DSP platform. In addition, faster DSPs generally consume more current, which can be a significant disadvantage in battery powered devices such as cellular telephones and personal communicators. This situation has created a need to transfer the DSP equalizer software functionality to hardware.

[0003] An Application Specific Integrated Circuit (ASIC) hardware implementation provides more processing power and is more area efficient than a DSP solution. Thus, there is a need for an equalizer implemented in hardware that is power and area efficient. There is also a need for a scalable equalizer. However, ASIC technology does not provide a quick design implementation, and is limited in its ability to be changed to accommodate revisions to a design or specification.

[0004] Trellis searching architectures have been studied for GSM (Global System for Mobile Communications) systems using serial (centralized) and parallel (distributed) approaches, as evidenced by A. Lloyd, M. Reynolds and Y. Shah, “VLSI Architectures for Viterbi Decoding,” IEE Colloquium on VLSI Implementations for Second Generation Digital Cordless and Mobile Telecommunication Systems, 1990, pp. 6/1-6/7, hereafter referred to as Lloyd and incorporated by reference herein in its entirety. The scalability of a systolic approach, as discussed by P. Gulak and T. Kailath, “Locally Connected VLSI Architectures for the Viterbi Algorithm,” IEEE Journal on Selected Areas in Communications, Vol. 6, No. 3, April 1988, pp. 527-537, hereafter referred to as Gulak and incorporated by reference herein in its entirety, has led to a single type of processing element (PE) which can be used as the basis for either a serial or parallel approach to Viterbi decoding. The PE is amenable to the pipelining of computational elements, see P. Pirsch, “Architectures for Digital Signal Processing,” John Wiley, New York, 1996, hereafter referred to as Pirsch and incorporated by reference herein in its entirety, which allows multiple operations to occur on each clock edge.

[0005] In Lloyd a locally connected array is shown (FIG. 3), and these authors state that their VLSI architecture is applicable to both Viterbi decoding and Viterbi equalization.

[0006] Also of interest is the publication by Mrityunjoy Chakraborty and Suraiya Pervin, “A systolic array realization of the adaptive decision feedback equalizer”, Signal Processing, Vol. 80 (2000), No. 12, pp. 2633-2640.

[0007] A need exists, as yet unfulfilled prior to this invention, to provide a scalable channel equalizer that is constructed and operated as a parallel, systolic array of like processing elements that exhibits, among other features, reduced state sequence estimation, decision feedback and a global search function for metric normalization and soft value determination in a serial parallel processor structure.

SUMMARY OF THE INVENTION

[0008] An embodiment of this invention provides a method that equalizes a phase modulated signal, e.g., an 8-PSK (Phase Shift Keying) signal, that is distorted by multipath during passage through a communication channel. Another embodiment of this invention provides apparatus, embodied as circuitry, that equalizes a phase modulated signal, e.g., an 8-PSK signal, that is distorted by passage through a communication channel. The preferred embodiment provides a systolic equalizer for realizing the Viterbi equalization of an 8-PSK signal.

[0009] A presently preferred embodiment of the systolic equalizer of this invention employs one Processing Element (PE) for each state of N states of the equalizer function. In the preferred embodiment each PE has a local memory (LM) and an associated Record (R). In the presently preferred embodiment the equalizer processes one burst of samples of a received signal at a time, and a channel estimator provides to the equalizer an estimate of the channel, convolved with a pre-filter response. The equalizer constructs a look up table (LUT) of the product of individual channel taps with individual constellation points of the 8-PSK constellation. An initialization function initializes the N Records and the N local memories, and a first sample is input to all of the PEs in parallel. Each PE then processes a Record and passes the processed Record to a neighboring PE. This occurs N times per sample so that each Record visits every PE. The Records and Local Memories are modified as required by each PE. After all PEs have processed all of the Records, a soft output calculation unit obtains data from a terminal (e.g., a rightmost) PE of the array of PEs and produces soft output for a previous symbol. The process can then be repeated for the next sample until the entire burst of samples is processed. Other embodiments of a systolic array equalizer are also provided.

[0010] An equalizer in accordance with this invention includes a logical arrangement of a plurality of instantiations of locally coupled processing elements that form a systolic array for processing in common received signal samples having distortion induced by passage through a communications channel. The equalizer outputs soft values for input to a decoder, the soft values representing an approximation of maximum a posteriori (MAP) probabilities. A trellis search procedure is employed to reconstruct estimates of a received signal sequence based on a reduced number of states. The reduced number of states is represented by a plurality of groups determined by partitioning a symbol constellation such that there are fewer groups than possible symbols.

BRIEF DESCRIPTION OF THE DRAWINGS

[0011] The foregoing and other features, aspects, and advantages of embodiments of this invention are made more apparent in the following description of presently preferred embodiments of the invention, when read in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are provided solely for the purposes of illustration, and are not to be viewed as a definition of the limits of the invention.

[0012] FIG. 1 is a diagram showing partitioning of an 8-PSK constellation using J=4, where in the trellis state, instead of a symbol being one of eight, it becomes a symbol from one of four groups (reduced states), where one of the symbols of the group is ignored as being very unlikely.

[0013] FIGS. 2A, 2B and 2C, collectively referred to as FIG. 2, are a block diagram of a single PE and associated components, a block diagram of a linear array of PEs, and a block diagram illustrative of the Systolic Equalizer Architecture in accordance with an embodiment of the invention, where there is one PE per state.

[0014] FIG. 3A is a block diagram illustrative of PE structure in accordance with an embodiment of the invention, while FIG. 3B shows the critical processing path of the PE of FIG. 3A.

[0015] FIGS. 4A and 4B are block diagrams illustrative of the Local Compares block of FIG. 3A.

[0016] FIGS. 5A and 5B are block diagrams illustrative of the Global Compares block of FIG. 3A.

[0017] FIG. 6 is a representation of the systolic architecture for N=16.

[0018] FIG. 7 is a diagram illustrative of a mapping of a four state architecture to one PE in accordance with an embodiment of the present invention.

[0019] FIG. 8 is a diagram showing how one PE can implement the four state equalizer.

[0020] FIG. 9A shows an example of a trellis for a 16-state equalizer that is not fully connected, although it is nicely grouped, while FIG. 9B shows a block diagram of the 16 state systolic equalizer, as also shown in FIG. 6.

[0021] FIG. 10 illustrates a receiver that includes an equalizer that is constructed and operated in accordance with this invention.

[0022] FIGS. 11A, 11B, 11C and 11D, collectively referred to as FIG. 11, are useful in explaining the operation of the equalizer algorithm, where FIG. 11A shows the overall algorithm flow and a definition of the notation, FIG. 11B shows a set of presently preferred equalizer equations, FIG. 11C shows a four state trellis and the maximization of the cumulative metric, and FIG. 11D shows the update procedure for the PE Local Memory and Channel LUT.

[0023] FIG. 12 is a logic flow diagram in accordance with a method of this invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0024] As employed herein a systolic processor is one that “pumps” or transfers data from one processing element to another. Systolic processor arrays, per se, are well known in the art. Systolic processor arrays have been used to increase pipelined computing capability, and therefore the computing speed, of various types of signal processors. Systolic processor arrays have been used for matrix multiplication, as well as for handwriting recognition and image processing tasks, among others. The systolic array may have the form of a two-dimensional linear array, or a three-dimensional area array.

[0025] An embodiment of the present invention defines a method and an apparatus that provide for the detection of an 8-PSK signal distorted by passage through a communication channel. A preferred, but not limiting, embodiment provides a systolic equalizer for Viterbi equalization of an 8-PSK signal for EDGE (Enhanced Data rate for Global Evolution) RF communication devices (an evolution of the GSM time division, multiple access (TDMA) mobile communications system). In the EDGE system the channel coder is convolutional, and uses multiple data rates and puncturing patterns. RMS delay spreads on the order of a symbol period are common, and equalization is required to meet the minimum performance requirements of the EDGE standard. Some applicable equalization algorithms are discussed by W. Gerstacker and R. Schober, “Equalization for EDGE mobile communications”, IEE Electronics Letters, Jan. 2, 2000, Vol. 36, No. 2, pp. 189-191, incorporated by reference herein in its entirety.

[0026] Referring first to FIG. 10, there is illustrated a simplified block diagram of a receiver 1, which in the preferred but not limiting embodiment is an EDGE receiver, that includes an equalizer 10 that is constructed and operated in accordance with this invention. An input RF signal is first converted to a digital signal by analog-to-digital conversion (ADC) to form the input signal to a receiver filter 2. The filtered signal z is then applied to a channel estimator 3, as well as to a prefilter 4. The channel estimator 3 forms an estimate of the RF communications channel h_(o) using a midamble training sequence of the EDGE burst, and a prefilter function (f) is calculated. The prefiltered signal samples y, where y equals z convolved with f, are passed to the equalizer 10 along with the impulse response (overall channel estimate) h, where h equals h_(o) convolved with f. The signal y is equalized in equalizer block 10, and the resulting soft values, soft, are input to a channel decoder 5, such as a Viterbi decoder.
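The two convolutions named in the preceding paragraph can be illustrated with a minimal Python sketch. The helper name convolve and all numeric values are assumptions made only for illustration; the channel estimation and prefilter calculation themselves are not shown.

```python
def convolve(a, b):
    """Full linear convolution of two sequences (complex values allowed)."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# Hypothetical toy values, not taken from the specification.
z = [1 + 0j, 0.5j, -0.25 + 0j]   # output of the receiver filter 2
h0 = [1 + 0j, 0.4 - 0.2j]        # channel estimate from the channel estimator 3
f = [0.9 + 0j, -0.1j]            # prefilter function f

y = convolve(z, f)    # prefiltered samples passed to the equalizer 10
h = convolve(h0, f)   # overall channel estimate passed to the equalizer 10
```

In this sketch y and h are both longer than their inputs because a full convolution is used; a practical receiver would typically truncate or window these sequences.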

[0027] The theory of operation of the equalizer 10 of this invention is summarized as follows.

[0028] Reconfigurable hardware may be used to create the equalizer 10 in accordance with the present invention. Locally connected processing elements (PEs) enable small circuit area realizations, and the preferred systolic architecture of the equalizer 10 is applicable for searching a trellis using the Viterbi algorithm. Realizations of 2, 4 and 16 state reduced state soft output equalizers have been evaluated by the inventors. Evaluations of the central processing tasks show that these tasks may be implemented with, by example, 30-200 kgates (thousand gates) operated at 25-35 MHz, depending on the number of states in the equalizer.

[0029] The following discussion explains the algorithm which the systolic equalizer architecture implements. It is not intended to be read as a derivation of the algorithm.

[0030] The formulation given in Lloyd for the Viterbi algorithm is repeated here.

[0031] For each sample received (y) at time (τ)

[0032] For each state (k)

[0033] For each state (j) that can lead to this state

[0034] j′=argmin_(j)(C_(j)+D_(j,k)(y))

[0035] C_(k)=C_(j′)+D_(j′,k)(y)

[0036] For all τ′<τ

[0037]  P_(k,τ′)=P_(j′,τ′)

[0038] P_(k,τ)=T_(j′,k)

[0039] The variables used here are defined as follows: C is the cumulative metric; D is the branch metric; and P is the symbol history. For the equalizer problem D=∥y−r∥², where y is the received sample and r is a reference sample calculated based on path history and the transition to the current state being evaluated.
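The formulation above may be easier to follow as a short Python sketch of one update over all states. This is a minimal illustration under stated assumptions, not the hardware method: the function name viterbi_step, the two-level (binary) alphabet, the two-tap channel and the initial metrics are all hypothetical and chosen only to keep the example small.

```python
def viterbi_step(C, P, predecessors, branch_metric, y):
    """One update over all states k for a received sample y.

    C[k] is the cumulative metric and P[k] the symbol history of state k.
    predecessors[k] lists (j, symbol) pairs: state j leads to state k when
    `symbol` is transmitted (the T_(j',k) entry of the formulation).
    """
    new_C, new_P = [], []
    for k in range(len(C)):
        # j' = argmin_j (C_j + D_(j,k)(y))
        j_best, sym = min(
            predecessors[k],
            key=lambda js: C[js[0]] + branch_metric(js[0], k, y),
        )
        new_C.append(C[j_best] + branch_metric(j_best, k, y))
        new_P.append(P[j_best] + [sym])  # copy history, append newest symbol
    return new_C, new_P

# Hypothetical toy setup: binary symbols, channel memory of one symbol.
levels = [-1.0, 1.0]   # state index -> previously sent level
taps = [1.0, 0.5]      # assumed channel taps

def branch_metric(j, k, y):
    r = taps[0] * levels[k] + taps[1] * levels[j]  # reference sample
    return abs(y - r) ** 2                          # D = |y - r|^2

predecessors = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
C = [0.0, 1e9]         # assume the burst starts from state 0
P = [[], []]
for y in [0.5, -0.5, 0.5]:   # noise-free samples for symbols 1, 0, 1
    C, P = viterbi_step(C, P, predecessors, branch_metric, y)
best = min(range(len(C)), key=lambda k: C[k])
```

With these toy inputs the surviving history P[best] is the transmitted symbol sequence, and its cumulative metric is zero because no noise was added.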

[0040] A basic function of the Viterbi algorithm is to determine what sequence of symbols has been received. One additional determination that is needed for equalization and decoding, such as turbo decoding, is to associate a probability value with each received element in the sequence. This is referred to in the art as soft output, which is an approximation of the maximum a posteriori (MAP) probability of the bit in question (see H. Van Trees, “Detection, Estimation, and Modulation Theory,” John Wiley, New York, 1968 for a discussion of exact MAP probabilities, and see Koch and Baier for approximate MAP probabilities).

[0041] A most straightforward way to determine the received sequence is to test for correlation of the received data with all possible received sequences. However, for a GSM 8-PSK EDGE burst of duration 148 symbols, there are too many possible sequences to test for in a typical real-time communications application. Because the sequences all contain identical sub-sequences, the searching task can be reduced significantly by taking an initial sub-sequence and extending it by one sample. The resulting sequences are then extended by one sample. For each extension, a distance measure (C, above) is updated to show how close the candidate sequence is to the received data. When two extensions make use of the same subsequence of length equal to the channel memory, only one sequence is retained (associated with j′ above); the others are discarded. This is denoted as Maximum Likelihood Sequence Estimation (MLSE), as described by G. Ungerboeck, “Adaptive Maximum-Likelihood Receiver for Carrier-Modulated Data-Transmission Systems”, IEEE Trans. on Comm., Vol. 22, No. 5, May 1974, pp. 624-636, incorporated by reference herein in its entirety. In this manner the number of candidate sequences is kept manageable. In order to administer these extensions and comparisons, the state (j, above) of the channel memory is defined. Any candidate sequence passing through a given state represents a hypothesis that the channel memory was composed of the symbols listed in the state description at that sample time (τ, above).

[0042] A convenient way to represent the states is to identify them as permutations of the possible transmitted symbol sequences. The length of the permutation is equal to the channel memory length less one, L−1. For example, for 8-PSK (M=8) and L=3, the state label would be represented by 2*3=6 bits, and the resulting number of states is 64. As an example, assume that the symbol 010₂ were sent, followed by symbol 110₂, and then finally symbol b₂b₁b₀. Then state 010110₂ should exhibit a very high metric in the search process in extending sequences using possible values of b₂b₁b₀.
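The arithmetic of this example can be checked with a few lines of Python. The function name mlse_state_space is hypothetical; the packing of the two symbols into one label simply follows the 010110₂ example in the text.

```python
import math

def mlse_state_space(M, L):
    """Return (bits in a state label, number of states) for an M-ary
    alphabet when the state holds the L-1 most recent symbols."""
    bits_per_symbol = int(math.log2(M))
    label_bits = bits_per_symbol * (L - 1)
    return label_bits, M ** (L - 1)

label_bits, num_states = mlse_state_space(M=8, L=3)

# State label for symbol 010 followed by symbol 110 (older symbol in the
# high-order bits, as in the example of the text).
state_label = (0b010 << 3) | 0b110
```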

[0043] A commonly used variation of MLSE is to only search over sequences of a limited length. The intersymbol interference (ISI) energy which remains is accounted for by canceling it using per survivor DFE (decision feedback equalization). The full channel length is L_(h), while the MLSE length is denoted L. The number of DFE channel taps is then L_(h)−L. This arrangement is sometimes referred to as Decision Feedback Sequence Estimation (DFSE).

[0044] Note that the filter taps used for reconstruction can be fixed at the beginning of a time slot, and need not be adaptive during the time slot. This is but one distinction between the teachings of this invention and the systolic decision feedback equalizer (DFE) described in the above-referenced Chakraborty et al. publication: “A systolic array realization of the adaptive decision feedback equalizer”, Signal Processing, Vol. 80 (2000), No. 12, pp. 2633-2640. Another distinction relates to the fact that the DFE need not employ a trellis search.

[0045] It is useful to reduce the number of states that are implemented. This can be done by labeling the states as permutations of groups (g) that the symbols could have been chosen from, as in FIG. 1. These groups are determined by partitioning the symbol constellation (see, for example, M. Eyuboglu and S. Qureshi, “Reduced-State Sequence Estimation with Set Partitioning and Decision Feedback”, IEEE Trans. on Comm., Vol. 36, No. 1, January 1988, pp. 13-20, incorporated by reference herein in its entirety). Because there are fewer groups than symbols, the number of permutations and states is thereby reduced. For example, if L=3 and each group contains two symbols (J₁=J₂=4 using Qureshi notation), so that there are 4 groups, then the number of states is 4^(L−1)=16, as opposed to 64. The performance loss is minimized by maximizing the distance within the constellation between symbols in the same group, making it unlikely that a received sample will be associated with an incorrect element of a group rather than the correct element. The number of symbols has not been reduced, however, so it is still necessary to extend every sequence using every one of the M symbols in the constellation.
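As a rough illustration of this partitioning, the following Python sketch splits an 8-PSK constellation into J=4 groups of two symbols each. The pairing of symbol index g with index g+4 (antipodal points), the unrotated unit-circle constellation and the function name num_reduced_states are assumptions; FIG. 1 may label the groups differently.

```python
import cmath
import math

M = 8   # 8-PSK
J = 4   # number of groups

# Assumed constellation: M equally spaced points on the unit circle.
points = [cmath.exp(2j * math.pi * i / M) for i in range(M)]

# Assumed partition: each group holds two antipodal points, which is the
# largest possible separation between two points on the unit circle.
groups = [[g, g + J] for g in range(J)]

def num_reduced_states(J, L):
    """States are permutations of L-1 group labels."""
    return J ** (L - 1)

intra_group_distance = [abs(points[a] - points[b]) for a, b in groups]
reduced = num_reduced_states(J, L=3)   # compare with 8 ** 2 = 64 full states
```

Every intra-group distance in this sketch equals the circle diameter, so a received sample near one member of a group is far from the other member.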

[0046] It is also desirable to obtain soft output corresponding to every coded bit sent by the channel coder of the transmitter. This is done by taking a difference of cumulative metrics, which is related to the likelihood ratio of the bit of interest being a one versus a zero (reference may be had in this regard to W. Koch and A. Baier, “Optimum and Sub-Optimum Detection of Coded Data Disturbed by Time-Varying Intersymbol Interference,” IEEE Globecom 1990, pp. 1679-1684, incorporated by reference herein in its entirety). For the presently preferred embodiment of the equalizer 10 it is desirable that the equalizer 10 search for the best cumulative metric associated with a transmitted one versus the best metric for a transmitted zero, take the difference, and then present the difference as the soft output soft. For M=8, there are three coded bits per received sample y and thus three soft values. These soft values are calculated after all of the energy corresponding to a given symbol has entered the equalizer 10. This results in producing the soft values soft associated with symbol s_(k−(L−1)), after receiving symbol s_(k). Because of the operation of reducing the number of states, the possibility may arise that there are no cumulative metrics observed by the equalizer 10 with an associated path history of a ‘1’, for example, at history position k−(L−1), bit i (for M=8, there are 3 bits at each symbol position, i=0, 1 or 2). In this case, the soft value is substituted. This substitution preferably reflects the probability of the bit in question.

[0047] The equations above can be extended as follows in order to obtain soft values:

For each sample (y) received
  For each state (k)
    For each state (j) and transition (m) that can lead to this state
      For each bit position (n) in the symbol
        b = history_(j)<<(log₂M)*(L−1)+n
        C_0(n) = max_(b=0) C(j,m)
        C_1(n) = max_(b=1) C(j,m)
  For each bit position (n) in the symbol
    if (soft(τ−τ_(m),n) exists)
      soft(τ−τ_(m),n) = C_0(τ,n)−C_1(τ,n)
    else
      soft(τ−τ_(m),n) = substitute(τ−τ_(m),n).
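The soft value step can be sketched in Python as follows. This is only an illustration of the max-and-subtract rule: the function name soft_values, the representation of the candidates as (metric, symbol) pairs, and the substitute callback are hypothetical, and the extraction of the symbol from the path history is assumed to have been done already.

```python
def soft_values(candidates, bits_per_symbol, substitute):
    """Return one soft value per bit position of the released symbol.

    candidates: (cumulative_metric, symbol) pairs, one per surviving
        state and transition, where `symbol` is the hypothesised value of
        the symbol being released.
    substitute(n, saw_one): hypothetical fallback used when no candidate
        carries a 0 (or no candidate carries a 1) in bit position n.
    """
    soft = []
    for n in range(bits_per_symbol):
        c0 = [c for c, s in candidates if not (s >> n) & 1]
        c1 = [c for c, s in candidates if (s >> n) & 1]
        if c0 and c1:
            soft.append(max(c0) - max(c1))   # C_0(n) - C_1(n)
        else:
            soft.append(substitute(n, bool(c1)))
    return soft

# Hypothetical candidates: no survivor has a 1 in bit position 2.
candidates = [(5.0, 0b011), (3.0, 0b001), (1.0, 0b010)]
result = soft_values(
    candidates,
    bits_per_symbol=3,
    substitute=lambda n, saw_one: -10.0 if saw_one else 10.0,
)
```

In this sketch a negative soft value favors a one and a positive value favors a zero, which follows from taking C_0 minus C_1 with larger metrics being better.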

[0048] The original descriptive equations are sufficiently general to include the possibility of reduced states with the addition of an additional internal loop. The additional internal loop would read “For each transition (m) which can lead to this state from state (j).”

[0049] The approximation of the MAP soft values may be developed using, for example, the approach of W. Koch and A. Baier, “Optimum and Sub-Optimum Detection of Coded Data Disturbed by Time-Varying Intersymbol Interference,” IEEE Globecom 1990, pp. 1679-1684. In Equation 15 of Koch and Baier it is pointed out that: −ln(e^(−x)+e^(−y))=min(x,y)−ln(1+e^(−|y−x|)), so that the log of the sum may be approximated by using a minimum operation. When a departure from exactness is made, many alternatives are possible. In the presently preferred embodiment of this invention the natural log (ln) and exponential (exp) functions have been eliminated in favor of using the above-shown max( ) operations, i.e., C_0(n)=max_(b=0) C(j,m), and C_1(n)=max_(b=1) C(j,m). Note in this regard that max( ) and min( ) are the same operation with the exception of the sign of the value being tested.
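The size of the term that is dropped when the log of a sum is replaced by a minimum can be checked numerically. The sketch below uses the standard identity −ln(e^(−x)+e^(−y)) = min(x,y) − ln(1+e^(−|y−x|)); the function names and the sample values of x and y are hypothetical.

```python
import math

def neg_log_sum(x, y):
    """Exact value of -ln(e^-x + e^-y)."""
    return -math.log(math.exp(-x) + math.exp(-y))

def correction(x, y):
    """The term ln(1 + e^-|y-x|) that the minimum operation discards."""
    return math.log1p(math.exp(-abs(y - x)))

x, y = 1.0, 4.0            # hypothetical metric values
exact = neg_log_sum(x, y)
approx = min(x, y)         # approximation by a minimum operation
error = approx - exact     # equals correction(x, y)
```

The discarded term never exceeds ln 2 and shrinks quickly as the two metrics move apart, which is why the approximation is most accurate in unambiguous cases.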

[0050] For the purposes of this invention, in a presently preferred embodiment the soft assignments: (soft(τ−τ_(m),n)=C_0(τ,n)−C_1(τ,n), soft(τ−τ_(m),n)=substitute(τ−τ_(m),n)) are considered to be or to represent an approximation of the maximum a posteriori (MAP) probabilities.

[0051] At this point it will also be useful to provide a short discussion related to the existence of cumulative values for the soft metric calculation. The soft values are calculated as the difference between two cumulative values. Those two cumulative values are approximations of the probability of a “1” and the probability of a “0” (see, for example, Equation 14 of Koch and Baier). In a reduced state sequence estimator (RSSE), one of these probabilities may not have been stored by the algorithm, i.e., no value exists for the algorithm to use. This situation can occur when one of these event probabilities (for example, the event “0”) is so low that all of the surviving paths only contain information about the other event (in this example, the event “1”). This lack of information is not important to performance because it only occurs in unambiguous cases. However, one must still supply the decoder with an appropriate soft value (in this example, the soft value would strongly favor a “1” in the decoding process). There are a number of ways to substitute an appropriate soft value. One technique is referred to as DFsoft, and is explained in commonly assigned U.S. patent application Ser. No. 09/928,927, filed Aug. 13, 2001, “Soft Bit Computation for Reduced State Equalizers”, Andrei Malkov, Heikki Berg, Pekka Kaasila, Kiran Kuchi and Jan Olivier, incorporated by reference herein in its entirety. In the DFsoft approach, the reference sample, r, of FIG. 3A is computed for each of the constellation points, x (see FIGS. 1 and 11A). The intersymbol interference due to earlier symbols is approximated by using the path history with the largest metric. In an 8-PSK embodiment, each constellation point x has a one-to-one mapping to a 3-bit sequence. For the bit in question, the Euclidean distance to the nearest symbol with a “0” in that position is subtracted from the Euclidean distance to the closest x with a “1” in that bit position. This value is substituted for the missing soft value. In the example at hand, the symbol with a “0” in that position would be quite distant, and a metric favoring “1” strongly would be created.
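The substitution described in this paragraph can be sketched in Python as follows. This is an illustration under several assumptions and is not the method of the referenced application: the function name dfsoft_substitute is hypothetical, the constellation is an unrotated unit circle, the bit mapping is plain binary on the symbol index (a real EDGE mapping differs), and the intersymbol interference from the best path history is passed in as a single precomputed value.

```python
import cmath
import math

def dfsoft_substitute(y, isi, h0, n, M=8):
    """Distance to the nearest candidate with bit n = 1, minus the
    distance to the nearest candidate with bit n = 0.

    y   : received sample
    isi : interference reconstructed from the best path history (assumed)
    h0  : leading channel tap applied to the current symbol (assumed)
    """
    nearest = {0: math.inf, 1: math.inf}
    for i in range(M):
        x = cmath.exp(2j * math.pi * i / M)   # constellation point
        r = h0 * x + isi                      # reference sample for x
        bit = (i >> n) & 1                    # assumed binary bit mapping
        nearest[bit] = min(nearest[bit], abs(y - r))
    return nearest[1] - nearest[0]

# A noise-free sample at symbol index 7 (bits 111) and at index 0 (bits 000).
y_all_ones = cmath.exp(2j * math.pi * 7 / 8)
y_all_zeros = cmath.exp(2j * math.pi * 0 / 8)
ones = [dfsoft_substitute(y_all_ones, 0, 1, n) for n in range(3)]
zeros = [dfsoft_substitute(y_all_zeros, 0, 1, n) for n in range(3)]
```

With this sign convention a negative result favors a “1”, matching the example in the text where the nearest “0” symbol is distant.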

[0052] The following description is intended to aid the reader in visualizing how the algorithm is partitioned over various computational elements. This is an important step in the hardware design, as it identifies the required computational units and how the data moves between the computational units. The pseudocode provided below illustrates in detail the algorithmic steps.

[0053] In the presently preferred embodiment of this invention the equalizer 10 is embodied in an integrated circuit, more specifically in an ASIC. As such, a number of practical considerations arise related to gate count, silicon area, clock rate, power consumption and flexibility (among others). The presently preferred embodiment is influenced at least in part by the VLSI Viterbi algorithm proposals by Lloyd and Gulak, as modified so as to provide a moderate clock rate and a minimization of integrated circuit area. Embodiments of the invention also preferably provide a scalable architecture that may be configured programmatically to realize various numbers of states at run time.

[0054] Three essentially generic Viterbi decoder architectures were studied in Lloyd. These were: (a) a purely serial processor, much like a DSP; (b) the serial processor with some additional numeric components; and (c) a parallel processor, with one processing element (PE) per trellis state (shown in FIG. 3 of Lloyd).

[0055] Additional architectures are presented by Gulak. A most interesting one for extension to the teachings of this invention is the linear array. This is a systolic array in which there is one PE for every state, and where data copies are circulated between the PEs. When describing the instant invention, use is made of the term “Record” for the data copies that are circulated between the PEs (from the Pascal programming language). For a fully connected trellis, all of the PEs are simultaneously busy operating on Records (i.e., on data copies).

[0056] FIG. 2A illustrates a parallel processing element (PE 12) having an associated Local Memory (LM) 13, a Channel LUT 14 and an input Record 18. FIG. 2B shows a linear array of the PEs 12 (12A-12D), assumed in this case to have the LM 13 internalized. In sequence, the input sample y arrives, the Records 18A-18D are processed in parallel by the PEs 12A-12D, the Records 18 shift forward (from left to right in this embodiment) and are processed again, and after all PEs 12 have processed all Records 18, the soft values (s) are output. FIG. 2C is another view of the equalizer 10 with data circulating between the linear array of PEs 12 (N PEs, PE₀ to PE_(N−1), designated for convenience as 12A-12D), in a manner somewhat similar to that shown in FIG. 3 of Lloyd and in FIG. 4 of Gulak. However, this invention further provides the Channel LUT 14 coupled to each PE 12 (Channel LUTs 14A-14D, respectively), and employs a soft value substitution unit 16, coupled to the right-most PE 12D of the linear array, in regards to the presently preferred embodiment of the channel equalizer 10. The circulated data (Records 18) include parameters: Cumulative, History, smallest_Cum, cum0 and cum1, as discussed below. Each PE 12A-12D is assumed in this embodiment to include the Local PE Memory (LM) 13A-13D, respectively. Common inputs to the equalizer 10 include a synchronization signal (synch), a clock signal (f_(logic)), and the input samples y of the signal received from the channel that is to be equalized.

[0057] The basic steps in the operation of the equalizer 10 are as follows:

[0058] Obtain the channel estimate h and calculate the Channel LUT 14;

[0059] Initialize the records (R) and PEs 12;

[0060] For each received data sample:

[0061] normalize the cumulative metric in PE Local Memory (LM) 13;

[0062] Repeat N times

[0063] Each PE accepts the record at its input

[0064] The record is processed, PE LM 13 is updated and some record fields may be overwritten.

[0065] shift the record from the PE input to the PE output

[0066] compute the soft output values

[0067] wait for the next burst
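The circulation of Records in the repeat-N-times portion of the steps above can be pictured with a small Python sketch. The function name circulate and the bookkeeping of visited PEs are hypothetical stand-ins; the actual processing performed by each PE is omitted.

```python
def circulate(num_pes):
    """Shift N Records through N PEs, N times, for one received sample.

    Returns the final Record positions and, for each Record, the set of
    PEs that processed it.
    """
    records = list(range(num_pes))              # records[p]: Record at input of PE p
    visited = [set() for _ in range(num_pes)]   # visited[r]: PEs seen by Record r
    for _ in range(num_pes):                    # "Repeat N times"
        for pe, rec in enumerate(records):      # all PEs work in parallel
            visited[rec].add(pe)                # stand-in for the PE processing
        # Shift each Record from PE input to PE output (left to right),
        # with the last PE feeding the first.
        records = [records[-1]] + records[:-1]
    return records, visited

final_positions, visited = circulate(4)
```

After N shifts every Record has been processed once by every PE and is back at its starting position, ready for the next sample.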

[0068] The Channel LUT 14 stores all possible products between the channel taps and constellation points x. For example, if the total estimated channel length L_(h)=6 and M=8, then there are 48 such complex products stored in the Channel LUT 14. The element responsible for loading of the Channel LUT 14 is shown in FIG. 3A as the unit 15.
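A table of this kind can be sketched in Python as shown below. The function names build_channel_lut and reference, the unrotated unit-circle constellation and the tap values are assumptions made for illustration only.

```python
import cmath
import math

def build_channel_lut(h, M=8):
    """Return lut[l][i] = h[l] * x_i for every tap l and constellation
    point index i (L_h * M complex products in total)."""
    points = [cmath.exp(2j * math.pi * i / M) for i in range(M)]
    return [[tap * x for x in points] for tap in h]

def reference(lut, symbols):
    """Form a reference sample by summing one LUT entry per tap.

    symbols[l] is the constellation index hypothesised l symbols ago.
    """
    return sum(lut[l][s] for l, s in enumerate(symbols))

# Hypothetical six-tap overall channel estimate (L_h = 6).
h = [1 + 0j, 0.5j, 0.25, 0, 0, 0.1]
lut = build_channel_lut(h)
entries = sum(len(row) for row in lut)

symbols = [0, 2, 4, 0, 0, 6]
r = reference(lut, symbols)
```

Because the products are precomputed once per burst, forming a reference sample in this sketch needs only table reads and additions, with no multiplications per transition.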

[0069] The synch signal is used to initialize and activate the PEs 12 for each record. The synch signal also controls the computation of the soft values from the record R_(N−1) after processing of the sample is completed. The f_(logic) clock represents the highest clock frequency required for operation of the PEs 12 so as to complete their computations within the allocated interval of time. As will be shown below, a suitable clock frequency may be about 30 MHz.

[0070] FIGS. 11A, 11B, 11C and 11D are also useful in explaining the operation of the equalizer 10. FIG. 11A shows the overall algorithm flow, the transition k from state i to state j given an input sample y and impulse response h for a constellation point x_(k) associated with transition k, as well as a definition of the notation. FIG. 11B shows a set of presently preferred equalizer equations. FIG. 11C shows a four state trellis and the maximization of the cumulative metric assuming constellation groups g₀ through g₃, as in FIG. 1. FIG. 11D shows the update procedure for the PE 12 Local Memory 13 and Channel LUT 14, based on the input Record 18.

[0071] The internal components of a PE 12 are shown in FIG. 3A. The PE 12 includes, in addition to the aforementioned LM 13, a Local Finite State Machine (FSM) 20, a local compares block 22, a global compares block 24, a magnitude squares block 26, a switch (SW) 28, and three summation nodes 30, 32 and 34. The following functions are carried out inside the PE 12 under the control of the local FSM 20. Before the processing of a sample begins, the cumulative metric (cum) value stored in the LM 13 is normalized. Also, the record R, designated Input Record 18, is updated based on the data in LM 13. Then repetitive processing of records begins as was explained above with regard to FIG. 2. The processing of the Input Record 18, such as one obtained from the adjacent PE 12 of FIG. 2, is carried out by an examination of each possible trellis transition from the state represented by R to the state represented by the PE 12. The path history associated with the trellis transition under consideration is used to address the Channel LUT 14. Output values from the Channel LUT 14 are successively summed in node 34 until the reference value, r, has been formed. At this time SW 28 is closed, and r is subtracted from the sample, y, in node 32, to obtain the difference d. The magnitude square of d is taken in block 26 to obtain the branch metric, b. The branch metric b is added in node 30 to the R_cum value from the record, R, to obtain a new cumulative metric (cum) for consideration. The new cum is passed to the Local Compares block 22 which may make changes to both the record R 18 and the LM 13. At the same time the Global Compares block 24 operates and may make different changes to R 18 and the LM 13.

[0072] The operation of the PE 12 may be summarized in a general sense as follows:

[0073] For each Record 18 arriving at the PE 12 input:

[0074] For each parallel transition of the trellis (illustrated in FIG. 11C), the PE 12:

[0075] (a) calculates the reference, r, where in the presently preferred, but non-limiting embodiment the Channel LUT 14 is used for this purpose;

[0076] (b) computes a difference d=y−r;

[0077] (c) squares the difference value, b=|d|²;

[0078] (d) adds this branch metric to the cumulative metric of the Record 18; and

[0079] (e) examines the path history in the Record 18 and updates the Local Memory 13 path history and the Local Memory 13 metrics as required;

[0081] carries out the global compares and updates the Record 18 and Local Memory 13 as needed.
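Steps (a) through (d) above can be illustrated with a minimal sketch. It assumes a hypothetical table layout in which channel_lut[tap][symbol] holds the product of one channel tap and one constellation point, and a hypothetical list symbol_indices giving the symbol hypothesized at each tap delay; neither name appears in the figures, and the sketch is not the hardware datapath itself.

```python
def branch_metric(y, channel_lut, symbol_indices):
    """Return (r, d, b) for one trellis transition.

    channel_lut[tap][symbol] is assumed to hold h[tap] * x[symbol].
    symbol_indices[tap] is the symbol hypothesized at that tap delay.
    """
    # (a) sum the LUT outputs to form the reference r (node 34)
    r = sum(channel_lut[tap][sym] for tap, sym in enumerate(symbol_indices))
    # (b) difference between the received sample and the reference (node 32)
    d = y - r
    # (c) magnitude square: two real multiplies and one real add (block 26)
    b = d.real * d.real + d.imag * d.imag
    return r, d, b


def new_cumulative_metric(record_cum, b):
    # (d) add the branch metric to the cumulative metric of the Record (node 30)
    return record_cum + b


# Two-tap example with taps h = [1, 0.5] and symbols x = [+1, -1].
lut = [[1.0, -1.0], [0.5, -0.5]]
r, d, b = branch_metric(1 + 1j, lut, [0, 1])
cum = new_cumulative_metric(2.0, b)
```

Step (e) and the global compares act on the resulting cum value and are not shown in this sketch.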

[0082] The Local Compares block 22 is shown in FIGS. 4A and 4B. The function of the Local Compares block 22 is to pose the following inquiries, and to overwrite data as needed according to the results:

[0083] Is R_cum (Record Cumulative) greater or less than LM 13 cum (cum>? LM.cum)

[0084] cum<? LM.new_smallest_cum.

[0085] cum>? LM.new_cum0 bit 0, 1, or 2

[0086] cum>? LM.new_cum1 bit 0, 1, or 2.

[0087] If any of these tests is affirmative, as indicated by comparator 23, shown as comparators 23A-23E in FIG. 4B, then the previous value in the LM 13 is overwritten under control of the overwrite control block 22A. The cum0 and cum1 tests are contingent on a value existing there. Note in FIG. 4B that a decode unit 22B can be used to decode the history and existences outputs of the LM 13, and the decoded output is used as a selection input to three multiplexors (MUXes) 22C-22E thereby enabling, for this non-limiting embodiment, three bit-specific tests via comparators 23C, 23D and 23E.
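The local compare tests can be sketched in software as follows. The sketch follows the comparisons listed above; the class LocalMemory, the function local_compares and the explicit bits argument (the per-bit values decoded from the path history) are illustrative assumptions, and the initial values mirror the "Nsmall" and "Nbig" style of initialization rather than any required constants.

```python
from dataclasses import dataclass, field

NSMALL = -1000.0  # assumed "very small" initial metric
NBIG = 1000.0     # assumed "very large" initial metric
NBITS = 3         # bits per 8-PSK symbol


@dataclass
class LocalMemory:
    cum: float = NSMALL
    history: int = 0
    new_smallest_cum: float = NBIG
    new_cum0: list = field(default_factory=lambda: [NSMALL] * NBITS)
    new_cum1: list = field(default_factory=lambda: [NSMALL] * NBITS)
    cum0_exists: list = field(default_factory=lambda: [False] * NBITS)
    cum1_exists: list = field(default_factory=lambda: [False] * NBITS)


def local_compares(lm, cum, history, bits):
    """Apply the local tests for one candidate cumulative metric."""
    # cum >? LM.cum: keep the better metric and its path history
    if cum > lm.cum:
        lm.cum = cum
        lm.history = history
    # cum <? LM.new_smallest_cum: track the smallest metric for normalization
    if cum < lm.new_smallest_cum:
        lm.new_smallest_cum = cum
    # three bit-specific tests: update cum0 or cum1 depending on the bit value
    for ib, bit in enumerate(bits):
        best, exists = (lm.new_cum1, lm.cum1_exists) if bit else (lm.new_cum0, lm.cum0_exists)
        if cum > best[ib]:
            best[ib] = cum
            exists[ib] = True


lm = LocalMemory()
local_compares(lm, 5.0, 0b101, [1, 0, 1])
local_compares(lm, 3.0, 0b010, [0, 1, 0])
```

In hardware these tests run on comparators 23A-23E; the sequential loop here only mirrors their outcome, not their timing.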

[0088] The Global Compares block 24 is shown in FIGS. 5A and 5B. The purpose of the Global Compares block 24 is to search through the PEs 12 and determine with comparators 25A-25D, shown collectively as comparator 25 in FIG. 5A, several globally large or small values. The smallest metric is located for normalization purposes, while the best cum0 and cum1 metrics are located to compute the soft values (s). This occurs in cooperation with a decode block 24B, driven by LM 13 and R 18 existences values, and MUXes 24C and 24D. After the processing of a sample, the Local FSM 20 directs the placement of the LM.new_smallest_cum value into the LM.old_smallest_cum. During the operation of the Global Compares block 24 this value moves into the Record 18 being processed if it is the smallest in value. As each PE 12 processes successive Records 18, the LM_smallest_cum value takes on the minimum value over all PEs 12. It is only necessary to activate the Global Compares cum0/cum1 operations for R_(N−1), which visits each PE 12 once.
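As a rough illustration of how the best cum0 and cum1 metrics yield the soft values, the following sketch forms one soft value per bit as the difference of the two metrics when both exist, and otherwise falls back to a substitute value. The function name soft_values and the sub_soft argument (a substitute computed by some other suitable method) are assumptions made for the example.

```python
def soft_values(old_cum0, old_cum1, cum0_exists, cum1_exists, sub_soft):
    """Return one soft value per bit position."""
    soft = []
    for c0, c1, e0, e1, s in zip(old_cum0, old_cum1, cum0_exists, cum1_exists, sub_soft):
        # use the metric difference only when both hypotheses were observed
        soft.append(c0 - c1 if (e0 and e1) else s)
    return soft


example = soft_values(
    [4.0, -1000.0, 2.0],   # best cum0 per bit
    [1.0, 3.0, 2.5],       # best cum1 per bit
    [True, False, True],   # cum0 existence flags
    [True, True, True],    # cum1 existence flags
    [0.1, 0.2, 0.3],       # substitute soft values
)
```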

[0089] Further with regard to the operation of the Global Compares block 24, after each sample period the extreme metric values need to be found for the purposes of normalization and computing the soft values. These values should be the extremes as searched over all of the PEs 12. The values which are to be searched are not, however, set until all of the records have been processed. A direct solution would create a centralized global searcher which could access all of the PEs 12. However, this approach would require a star shaped bus or a shared bus to be incorporated into the systolic array design, and would greatly diminish the benefits of the algorithm realization on the presently preferred locally-connected structure. It is thus preferred to obtain the desired values after a one sample delay using a serial search. The one sample delay has no impact on performance, as the Records 18 are already circulating in a serial fashion in order to compute the metrics and path histories. The search is accomplished by adding the cum_old type variables to the Records 18 and Local Memories 13, and using the Global Compare 24 function. N values are searched to achieve the function. This can be achieved by only activating the Global Compare 24 function in the rightmost PE 12 (all N Records 18 pass through every PE 12 in the preferred embodiment). As such, it can be appreciated that one may save power and gates by providing the Global Compares 24 block in only the right-most PE 12. Alternatively, if provided in each PE 12 the Global Compares 24 could be powered down if not in use, or until use is required.
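The serial search can be pictured with a small simulation. The sketch below assumes the variant in which every PE has its Global Compares function active (the rightmost-PE-only variant is not modeled), a ring in which Record r sits at PE (r + step) mod N, and a hypothetical function name serial_global_min; it shows only the smallest-metric search.

```python
def serial_global_min(local_values):
    """Circulate N Records past N PEs and exchange the smaller value each visit."""
    n = len(local_values)
    lm_small = list(local_values)    # per-PE LM.old_smallest_cum
    rec_small = [float("inf")] * n   # per-Record smallest_cum, initialized "big"
    for step in range(n):
        for pe in range(n):
            rec = (pe - step) % n    # Record currently at this PE
            smaller = min(lm_small[pe], rec_small[rec])
            lm_small[pe] = smaller   # LM takes the Record value if it is smaller
            rec_small[rec] = smaller # Record takes the LM value otherwise
    return lm_small, rec_small


lm_after, rec_after = serial_global_min([7.0, 3.0, 9.0, 5.0])
```

After N shifts every Record has visited every PE, so no shared or star-shaped bus is needed to agree on the minimum.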

[0090] In certain embodiments, for example in FIG. 9B, a final comparison would be done in block 16 of FIG. 9B for the soft values. The normalization may be done locally in each sub-array (e.g., PE00, PE01, PE02, PE03), or in block 16 and communicated to the PE subarrays. Normalization per se in a Viterbi-type algorithm is a well-known step to reduce the number of bits needed to represent the cumulative metrics.

[0091] The foregoing discussion concerns both two state and four state equalizers 10. For a 16 state equalizer 10, the trellis is not fully connected as shown in FIG. 9A. This means that the components of the systolic equalizer 10 are not all busy if every state record circulates throughout the linear array. However, symmetry conditions exist in the 16 state trellis such that it can be realized using four banks of four element PEs 12. The following discussion pertains to the special requirements for the 16 state systolic equalizer 10, followed by a description of how to map many PEs 12 to a single PE using a higher clock rate. FIG. 9B is a block diagram of a systolic implementation of the 16 state equalizer 10, where there are four instances of the four PE linear array shown in FIGS. 2B and 2C, each operating in parallel to provide soft values (s) from the soft value calculation block 16. The Channel LUTs 14 are not shown in order to simplify the drawing.

[0092] Examination of the trellis for the 16 state equalizer 10 of FIG. 9A shows that it may also be implemented with a linear array as shown in FIG. 6. The 16 state linear array is composed of four linear PE arrays, as shown in FIG. 9B, where each linear array forms one row of the table shown in FIG. 6. During processing of a sample y, a Record 18 exiting from the right-most PE 12 re-enters at the left-most PE 12 in the same row. In this manner the 16 state equalizer 10 uses four times the PEs 12 and four times the Records of the four state equalizer shown in FIGS. 2B and 2C. After the sample y is processed, the soft values at the rightmost PEs are gathered by the comparison unit 16 for reduction to a single soft value per bit.

[0093] After the sample y has been processed, each PE 12 updates the Record memory adjacent to it (as before, except that the PE 12 label generally does not match the Record label). The Records 18 are then circulated through the entire array so that they are available to a PE where they are needed. Also, an additional element of delay in finding the smallest cum is preferably used.

[0094] The four PEs 12A-12D of FIGS. 2B and 2C may be mapped to a single PE operating at a higher clock rate by adding memory (see Pirsch in this regard). If one assumes that the LM 13 is removed from each PE 12, and that delays are added between PEs 12, one obtains the embodiment of FIG. 7, where the circles between PEs 12 represent delay elements.

[0095] In this case the desired schedule is as follows. First, PE0 processes R0, PE1 processes R1, PE2 processes R2 and PE3 processes R3. The records are then shifted, so that PE0 processes R3, PE1 processes R0, PE2 processes R1 and PE3 processes R2. After the next shift PE0 processes R2, PE1 processes R3, PE2 processes R0 and PE3 processes R1. After the final shift PE0 processes R1, PE1 processes R2, PE2 processes R3 and PE3 processes R0, and the delayed soft values are then output from block 16. This can be accomplished by shifting the LMs 13 (which actually define the identity of the PE 12) against the proper Records 18. The result is the equalizer embodiment shown in FIG. 8.
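The schedule just described amounts to a rotation by one position per shift: at shift number s, PE p processes Record (p − s) mod 4. A minimal sketch, with the hypothetical helper record_schedule, reproduces the four assignments:

```python
def record_schedule(num_pes=4):
    """Row s lists, for PE0..PE(n-1), the index of the Record processed after s shifts."""
    return [[(pe - step) % num_pes for pe in range(num_pes)]
            for step in range(num_pes)]


schedule = record_schedule()
for step, row in enumerate(schedule):
    print(step, ["PE%d<-R%d" % (pe, rec) for pe, rec in enumerate(row)])
```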

[0096] According to the schedule sequence discussed above, after R0-R3 have been processed once, the Records 18 are shifted backwards against the direction of the arrow one space so that LM0 lines up with R3 at the beginning of the processing of the next four record times. Since one PE 12 in the four PE embodiment of the four state equalizer 10 processed each record four times, when using one PE 12 as in FIG. 8 there are a total of 16 record processing events.

[0097] A presently preferred algorithm for use in the systolic equalizer 10 architecture is now described.

[0098] The following description includes variable declarations and pseudocode for the burst by burst operation of the systolic equalizer 10. All of the operations shown in the pseudocode are implemented by the hardware elements discussed above.

[0099] Definitions:

[0100] Constants

Constant  Meaning                                  Value
N         Number of States                         2, 4, or 16
NP        Number of PEs                            2, 4 or 16 (or 1)
NT        Number of Member Symbols in Partition    4, 2 or 2
LH        Length of History                        6
L         Number of MLSE taps                      2, 2, or 3
M         Modulation Level                         2 or 8
Ns        Number of samples processed in a burst   3 + 58 + 26 + 58 + 3 = 148
Nsmall    Init. value                              −1000 or the like
Nbig      Init. value                              1000 or the like

[0101] Variable Names

Name      Meaning
R         Record Unit 18
LM        Local Memory 13
h         channel estimate
x         constellation points
C_LUT     Channel Lookup Table 14

[0102] Operation Names

Name      Meaning
Mc        Complex Multiply
Mr        Real Multiply
Ac        Complex Addition
Ar        Real Addition
T         Transfer a value to a register

[0103] Record Unit 18 Structure

[0104] Cum 16 bits

[0105] History 15 bits

[0106] old_cum0 (3 values)

[0107] old_cum1 (3 values)

[0108] smallest_cum 16 bits

[0109] cum0_exists (3 bits)

[0110] cum1_exists (3 bits)

[0111] Each record 18 has a need for nine, 16 bit words and two, 3 bit words. There is one record per state.

[0112] Local Memory 13 Structure

[0113] Cum 16 bits

[0114] History 15 bits

[0115] new_smallest_cum 16 bits

[0116] old_smallest_cum 16 bits

[0117] new_cum0 (3 values)

[0118] new_cum1 (3 values)

[0119] old_cum0 (3 values)

[0120] old_cum1 (3 values)

[0121] cum0_existences (3 bits)

[0122] cum1_existences (3 bits)

[0123] Each LM 13 employs 24, 16 bit words and six, 3 bit words. There is one LM 13 per state.

[0124] Memory

[0125] Altogether, this embodiment of the systolic equalizer 10 uses 33, 16 bit words and eight, 3 bit words per state.

N States   16 bit words   3 bit words
2          66             16
4          132            32
16         528            128

[0126]

// Partition holds the complex symbol values for this state, selected from J1 partitions of 8-PSK.
For Each Burst:
  // initialize registers in Records (R) and Local Memories (LM)
  for (i=0;i<N;i++) {
    R(i).Cum = Nsmall;
    R(i).History = 0;
    R(i).old_cum0 = Nsmall;
    R(i).old_cum1 = Nsmall;
    R(i).smallest_cum = Nbig;
    R(i).cum0_exists = (0,0,0);
    R(i).cum1_exists = (0,0,0);            // opcount: 7 N T
  }
  for (i=0;i<N;i++) {
    LM(i).Cum = Nsmall;
    LM(i).History = 0;
    LM(i).new_smallest_cum = Nbig;
    LM(i).old_smallest_cum = 0;
    LM(i).new_cum0 = Nsmall;
    LM(i).new_cum1 = Nsmall;
    LM(i).old_cum0 = Nsmall;
    LM(i).old_cum1 = Nsmall;               // opcount: 8 N T
  }
  // generate the Channel_LUT 14 using unit 15
  for (i=0;i<M;i++) {
    for (j=0;j<LH;j++) {
      C_LUT(i,j) = h(j)*x(i);              // opcount: M Lh Mc
    }
  }
  // for each symbol
  for (is=0;is<Nsym;is++) {                // it is implied that all op counts in this loop are * Ns
    // normalize cumulative metrics
    for (i=0;i<N;i++) {
      LM(i).cum -= LM(i).old_smallest_cum; // N Ar
    }
    // compute soft substitute values; may use the method explained by A. Viterbi,
    // "CDMA: Principles of Spread Spectrum Communication", Addison-Wesley, 1995, eqtn 4.51,
    // or any other suitable method, including DFsoft
    // r_dfe is a reference signal based on the jth constellation symbol at time is-(L-1)
    // d is the square of the distance from the received sample to r_dfe
    for (j=0;j<M;j++) {
      d(j) = |y(is-(L-1)) - r_dfe(is-(L-1),j)|²;
    }
    // best0 is the value of d associated with a symbol with a 0 in position k
    // best1 is the value of d associated with a symbol with a 1 in position k
    for (k=0;k<Log2M;k++) {
      best0(k) = min0(d,k);
      best1(k) = min1(d,k);
      sub_soft(k) = best1(k) - best0(k);
    }
    for (i=0;i<Log2M;i++) {
      if (R(N-1).old_cum0_exists(i) && R(N-1).old_cum1_exists(i)) {
        soft(i) = R(N-1).old_cum0(i) - R(N-1).old_cum1(i);
      }
      else {
        soft(i) = sub_soft(i);
      }
    }
    // transfer some Local Memory 13 elements into Record 18 and initialize existence variables
    for (i=0;i<N;i++) {
      for (ib=0;ib<Log2M;ib++) {
        LM(i).old_cum0(ib) = LM(i).new_cum0(ib);
        LM(i).old_cum1(ib) = LM(i).new_cum1(ib);
      }                                    // 6 N T
      R(i).cum = LM(i).cum;
      R(i).history = LM(i).history;
      R(i).old_smallest_cum = Nbig;
      LM(i).old_smallest_cum = LM(i).new_smallest_cum;
      LM(i).new_cum0_exists = (0,0,0);
      LM(i).new_cum1_exists = (0,0,0);
      LM(i).cum = Nsmall;
      LM(i).new_smallest_cum = Nbig;       // 8 N T
    }
    for (ir=0;ir<N;ir++) {                 // for each record
      // below is in parallel if NP = N; operate each PE
      for (ip=0;ip<N;ip++) {
        for (it=0;it<NT;it++) {            // for each parallel transition
          rM = C_LUT(0,LM(ip).Partition(it));            // N*N*Nt*T
          rD = 0+j0;
          for (ic=0;ic<LH-1;ic++) {
            product = C_LUT(ic+1,R(ir).History(ic+1));   // N*N*Nt*(Lh-1)*T
            rD += product;                               // N*N*Nt*(Lh-2)*Ac
          }
          r = rD + rM;                     // N*N*Nt* 1 Ar
          d = y(is) - r;                   // N*N*Nt* 1 Ar
          b = |d|²;                        // N*N*Nt*(2Mr+1Ar)
          c = R(ir).cum + b;               // N*N*Nt* 1 Ar
          cum = trunc(c);                  // N*N*Nt* 1 T
          if (cum > LM(ip).cum) {          // N*N*Nt* 1 Ar
            LM(ip).cum = cum;              // N*N*Nt* 2 T
            LM(ip).History = R(ir).History;
          }
          if (cum < LM(ip).new_smallest_cum) {
            LM(ip).new_smallest_cum = cum;
          }
          if (R(ir).smallest_cum < LM(ip).old_smallest_cum) {   // N*N*Nt* 1 Ar
            LM(ip).old_smallest_cum = R(ir).smallest_cum;
          } else {
            R(ir).smallest_cum = LM(ip).old_smallest_cum;
          }
          for (ib=0;ib<Log2M;ib++) {
            bit = R(ir).History >> bit_delay & 1;
            if (!bit) {
              if (cum > LM(ip).new_cum0(ib)) {           // 3* N*N*Nt* 1 Ar (Log2M=3 for 8-PSK)
                LM(ip).new_cum0(ib) = cum;
                LM(ip).new_cum0_exists(ib) = 1;          // 3 * 2 T
              }
            }
            else {
              if (cum > LM(ip).new_cum1(ib)) {           // 3* N*N*Nt* 1 Ar
                LM(ip).new_cum1(ib) = cum;
                LM(ip).new_cum1_exists(ib) = 1;
              }
            }
            if (!R(ir).old_cum0_exists(ib) && LM(ip).old_cum0_exists(ib)) {
              R(ir).old_cum0(ib) = LM(ip).old_cum0(ib);
              R(ir).old_cum0_exists(ib) = 1;             // this will not happen often
            }
            if (R(ir).old_cum0(ib) < LM(ip).old_cum0(ib) && R(ir).old_cum0_exists(ib) && LM(ip).old_cum0_exists(ib)) {   // N*N*Nt* 1 Ar
              R(ir).old_cum0(ib) = LM(ip).old_cum0(ib);  // 3 * T
            }
            if (!R(ir).old_cum1_exists(ib) && LM(ip).old_cum1_exists(ib)) {
              R(ir).old_cum1(ib) = LM(ip).old_cum1(ib);
              R(ir).old_cum1_exists(ib) = 1;             // same for cum1, won't happen often
            }
            if (R(ir).old_cum1(ib) < LM(ip).old_cum1(ib) && R(ir).old_cum1_exists(ib) && LM(ip).old_cum1_exists(ib)) {   // N*N*Nt* 1 Ar
              R(ir).old_cum1(ib) = LM(ip).old_cum1(ib);  // 3 * T
            }
          }
        }
      }
    }
  }
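The Channel_LUT generation step of the pseudocode, C_LUT(i,j) = h(j)*x(i), can be sketched in a few lines. The 8-PSK points are assumed here to be x_i = exp(j·2π·i/8) with no rotation or particular bit mapping, and the six tap values are made-up numbers; both are assumptions for illustration only.

```python
import cmath


def psk8_points():
    """Assumed unrotated 8-PSK constellation, x_i = exp(j*2*pi*i/8)."""
    return [cmath.exp(2j * cmath.pi * i / 8) for i in range(8)]


def build_channel_lut(h, x):
    """Return lut[i][j] = h[j] * x[i], one row per constellation point."""
    return [[hj * xi for hj in h] for xi in x]


# Hypothetical channel estimate with LH = 6 taps.
h = [1.0 + 0.0j, 0.5 - 0.2j, 0.1 + 0.3j, 0.05j, -0.02 + 0.0j, 0.01 + 0.01j]
c_lut = build_channel_lut(h, psk8_points())
num_entries = sum(len(row) for row in c_lut)
```

With M = 8 and LH = 6 the table holds 48 complex products, computed once per burst.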

[0127] FIG. 12 is a logic flow diagram in accordance with a method of this invention, as described above in reference to the pseudocode example. At Block A a new burst is received from the channel, and at Block B the Channel LUT 14 is initialized with the product of the channel tap estimates and the constellation points. At Block C the LMs 13 and Records 18 are initialized for each state. Block D represents an outer loop of Ns times for each received sample y from the prefilter 4. Block E represents a first inner loop of N, where N is the number of states. At Block F certain LM 13 information is transferred to the Record 18. Block G represents a second inner loop of Nt, the number of parallel transitions. At Block H the method uses the Record History, the current state and transition value to sum the reference r, using the output from the Channel LUT 14 (summation element 34 in FIG. 3A). At Block I the difference d is computed as y−r (difference element 32 in FIG. 3A), and the branch metric b is then computed in the absolute magnitude squared element 26 of FIG. 3A. In Block J the cumulative metric is computed by adding in summation element 30 of FIG. 3A the originating state cumulative metric from the Record 18 and the branch metric b. At Block K the Local Compares block 22 of FIG. 3A decodes (decoder 22B) the History and soft bit existences from the Local Memory 13, and at Block L the cumulative metric comparisons are performed (using comparators 23A-23E of FIG. 4B). At Block M the method overwrites, as necessary, the cumulative metric and soft bit values, and at Block N the soft bit values are computed, substituting if necessary. At Block O the loop is re-entered at the appropriate point, depending on the state of the outer and two inner loop counters.

[0128] As the preferred embodiment of the invention is implemented in an ASIC or some other type of integrated circuit, a consideration is now made of the gate count and timing. The gate count follows from the hardware architecture. The timing analysis may use both the hardware and the pseudocode to evaluate the critical path for determination of the minimum clock rate required.

[0129] The gate counts for the basic components needed are provided for reference purposes only, and are exemplary of one suitable embodiment.

n_input bit   Adder-real   Multiplier-real   Adder-Complex   Mult-Complex
10            260          870               520             4000
11            286          1045              572             4752
12            312          1236              624             5568
13            338          1443              676             6448
14            364          1666              728             7392
15            390          1905              780             8400
16            416          2160              832             9472
17            442          2431              884             10608
18            468          2718              936             11808
19            494          3021              988             13072
20            520          3340              1040            14400

[0130] Each device is assumed to have a latched input and output. A real adder requires 26n gates and a real multiplier requires 8n²+7n gates, where n is the input word length.

[0131] Using the gate counts one may estimate the number of gates for a single PE 12, excluding control logic.
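The two stated formulas can be checked against the gate count table with a short sketch. The real adder and real multiplier expressions are taken from the text; the complex adder (two real adders) and complex multiplier (four real multipliers plus two real adders) expressions are not stated in the text and are inferred here only because they reproduce the tabulated values.

```python
def real_adder_gates(n):
    return 26 * n                     # stated: 26n gates


def real_multiplier_gates(n):
    return 8 * n * n + 7 * n          # stated: 8n^2 + 7n gates


def complex_adder_gates(n):
    # inferred: two real adders (real and imaginary parts)
    return 2 * real_adder_gates(n)


def complex_multiplier_gates(n):
    # inferred: four real multipliers plus two real adders
    return 4 * real_multiplier_gates(n) + 2 * real_adder_gates(n)


for n in (10, 16, 20):
    print(n, real_adder_gates(n), real_multiplier_gates(n),
          complex_adder_gates(n), complex_multiplier_gates(n))
```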

[0132] r—1 complex adder.

[0133] d=y−r; 1 complex adder

[0134] b=|d|²; 2 real multipliers and 1 real adder;

[0135] c=c+b; 1 real adder

[0136] local compares; 1 real adder

[0137] global compares; 1 real adder;

[0138] memory

[0139] 1 record and 1 LM per PE require a total of 25 16 bit words

[0140] 1 LUT per PE requires 96 16 bit words

PE 12 Gate Count

function (all are 16 bits)         Number per PE   gates
complex adder                      2               1664
real multiplier                    2               4320
real adder                         4               1664
LM (16 bit words - latch)          9               576
R (16 bit words - latch)           16              1024
LUT (16 bit words - SRAM)          96              2304
Total per PE with dedicated LUT                    11,452 (approx. 12,000)

[0141] Latch memory is counted at 16 transistors per bit, 64 gates per 16 bit word. SRAM memory is counted at 6 transistors per bit, 24 equivalent gates per 16 bit word. The gate counts shown include redundant registers, as the gate counts for each adder and multiplier include individual latches at the input and output.

[0142] Per equalizer, one complex multiplier (assume 10 bit inputs) is needed for construction of the LUT 14. This adds 4000 gates to the overall total. Assume C_LUT is calculated once per burst and then copied into each of N local lookup tables, one per PE 12. In this way, it is not necessary to arbitrate access to the data therein. The number of C_LUT registers could be reduced to 96 in all cases by introducing arbitration between the PEs 12 and the Channel_LUT table 14.

Gate Count Totals

                      2 state            4 state            16 state
                      number   gates     number   gates     number   gates
10 bit complex mult   1        4,000     1        4,000     1        4,000
number of PEs         2        24,000    4        48,000    16       192,000
number of LUTs        2                  4                  16
Total gates                    28,000             52,000             196,000

[0143] A purpose of the timing analysis is to discover the critical path of the algorithm shown in the pseudocode when implemented on the systolic equalizer 10. Reference can also be made to FIG. 3B, which illustrates the PE 12 of FIG. 3A so as to emphasize the pipelined portion of the architecture and the portion where the transitions are processed serially.

[0144] Once per burst:

[0145] 1. Initialize the 48 complex elements of Channel_LUT 14.

[0146] 2. Initialize the LM 13 and R 18 of each PE 12

for each receive sample {
  Normalize LM.cum, compute soft substitution.
  transfer LM into R. - 9 words, e.g., nine clocks (ticks)
  Repeat N times { // cycling of records
    Repeat Nt times { // parallel transitions; assume pipelining of this Nt loop
      Repeat Nh times { // sum up the reference
        pipelined { Read from C_LUT, Clock Adder1 }
      } // Nh; give this Nh+1 = 7 ticks
      subtract from sample using Adder2; 1 tick
      take magnitude square; 2 ticks
      add R.cum; 1 tick
      cum has been produced for this transition
      5 similar compares: { smallest_cum, cum, cum0/1(3) }
      potentially overwrite history; 2 ticks
    } // Nt
  } // N

[0147] compute soft outputs; clock ticks neglected
}

Timing critical path                                         2 state          4 state          16 state
formation of r (pipelined)                                   7 ticks          7 ticks          7 ticks
d = y−r; (pipelined)                                         1                1                1
b = |d|²; (pipelined)                                        2                2                2
3 local compares cum0/1                                      6                6                6
min cum compare                                              2                2                2
max cum compare                                              2                2                2
possible history overwrite                                   2                2                2
ticks per transition                                         10 + 12 Nt       10 + 12 Nt       10 + 12 Nt
transitions per record, Nt                                   4                2                2
records per PE                                               2                4                4
extra ticks to position records for next sample (at least)   0                0                16
extra ticks to load LM into R before next sample             9                9                9
ticks per sample                                             117              145              161
samples per burst                                            116              116              116
burst period (microseconds)                                  577              577              577
minimum required clock rate (MHz)                            approx. 24 MHz   approx. 30 MHz   approx. 33 MHz
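The last rows of the timing table follow from simple arithmetic, sketched below. Two readings are assumed: that the entry "10 + 12 Nt" is the tick count spent on one record (a fixed 10 ticks plus 12 ticks per parallel transition), which reproduces the four state and 16 state figures, and that the minimum clock rate is the number of ticks per burst divided by the burst period, rounded up. The function names are illustrative.

```python
import math


def ticks_per_sample(records_per_pe, nt, position_ticks, load_ticks=9):
    """Assumed reading: each record costs 10 + 12*Nt ticks, plus per-sample overhead."""
    return records_per_pe * (10 + 12 * nt) + position_ticks + load_ticks


def min_clock_mhz(ticks, samples_per_burst=116, burst_period_us=577):
    """Ticks per microsecond equals the clock rate in MHz."""
    return ticks * samples_per_burst / burst_period_us


four_state = ticks_per_sample(records_per_pe=4, nt=2, position_ticks=0)
sixteen_state = ticks_per_sample(records_per_pe=4, nt=2, position_ticks=16)

# Clock rates computed from the tabulated ticks per sample (117, 145, 161).
rates = [math.ceil(min_clock_mhz(t)) for t in (117, 145, 161)]
```

The two state figure of 117 ticks is taken directly from the table rather than derived from the assumed formula.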

[0148] It would be possible to schedule unrelated comparisons to overlap. For instance, if the first transition has a cum0 value for bit 0, and the second transition has a cum1 value for bit 0, then these can be updated and compared to Local Memory 13 in parallel. An overwrite of Local Memory 13 by the first transition's value of cum0, bit 0, will not collide with a comparison of that register by the second transition, since it has no value there. Allowing for these kinds of scheduling can reduce the critical path.

[0149] Based on the foregoing description those skilled in the art should appreciate that the disclosed systolic equalizer 10 provides a number of advantages over conventional equalizers. It is again noted that while Lloyd discuss a locally connected array (FIG. 3), and state that their application is applicable to both Viterbi decoding and Viterbi equalization, they do not disclose the scalable channel equalizer 10 that is constructed and operated as a parallel, systolic array of like processing elements 12 that exhibits, among other features, reduced state sequence estimation, decision feedback and the global search function (Global Compares block 24) for metric normalization and soft value determination. In the presently preferred embodiment of the equalizer 10 the sorting of soft values can be accomplished by multiple passes through the systolic array.

[0150] In addition, a 16 state equalizer can be realized as four, four PE 12 linear arrays, and by shifting the Records 18 as described above. This invention also provides for cycling the LM 13 against the Records 18, interleaved with shifts of the Records 18 every four clocks to provide a four state equalizer using but one PE 12. Thus, it should be appreciated that this invention provides an equalizer 10 that contains a logical arrangement of a plurality of instantiations of the locally coupled, possibly identical processing elements 12 that form the systolic array. The arrangement may be viewed as being logical in the sense that, as examples, a 16 state equalizer can be realized with but four physical PEs 12, and a four state equalizer can be realized with but one PE 12.

[0151] The scalability made possible by the teachings of this invention also enables using one four PE systolic array to realize a two state equalizer, such as by powering off or otherwise disabling two of the PEs; a four state equalizer (as shown in FIGS. 2B and 2C); or a 16 state equalizer by cycling the LMs 13 and the Records 18, as shown in FIG. 7.

[0152] This invention also provides for using combinatorial logic, as shown in FIG. 4B, for decoding path histories and activating cumulative metric comparators for sorting cumulative metrics so as to ultimately obtain the desired soft bits.

[0153] Although described in the context of particular embodiments of the scalable systolic architecture, it should be apparent to those skilled in the art that a number of modifications and various changes to these teachings may occur. Thus, while the invention has been particularly shown and described with respect to one or more preferred embodiments thereof, it will be understood by those skilled in the art that certain modifications or changes, in form and shape, may be made therein without departing from the scope of the invention as set forth above.

1. An equalizer, comprising a logical arrangement of a plurality of instantiations of locally coupled processing elements PEs forming a systolic array for processing in common received signal samples having distortion induced by passage through a communications channel, and outputting soft values for input to a decoder, the soft values representing an approximation of maximum a posteriori (MAP) probabilities, the equalizer further comprising a global compares block for comparing a data entry in records operated on by said plurality of PEs and selectively overwriting said data entry in one of said records based on the comparing.
2. An equalizer as in claim 1, where a trellis search procedure is employed to reconstruct estimates of a received signal sequence based on a reduced number of states represented by a plurality of groups determined by partitioning a symbol constellation such that there are fewer groups than possible symbols.
3. An equalizer as in claim 2, where the signal constellation represents one formed by 8-PSK modulation of a transmitted signal sequence.
4. An equalizer as in claim 1, where an effect of a prior symbol is subtracted using a decision feedback mechanism.
5. An equalizer as in claim 1, where said locally coupled processing elements operate in parallel and each comprise an input node for receiving a Record to be processed; a node for coupling to a channel look-up table (Channel LUT) addressed by the Record and storing products of individual channel taps with individual constellation points of a symbol constellation; a local memory (LM); and circuitry for calculating a reference, r, using data stored in the Channel LUT; circuitry for computing a difference d=y−r; circuitry for squaring the difference value, b=|d|², to form a branch metric; circuitry for adding the branch metric to a cumulative metric of the Record and circuitry for examining a path history in the Record and updating the LM path history and the LM metrics as needed, and where at least one processing element further comprises circuitry, operating after all Records are processed, for performing a global metric comparison and updating Records and LMs as needed.
6. An equalizer as in claim 1, where said locally coupled processing elements are coupled together as a linear systolic array of processing elements.
7. An equalizer as in claim 1, where said logical arrangement of the plurality of instantiations of locally coupled processing elements is embodied in one processing element, where for N states the one processing element successively processes N instantiations of a local memory against an input Record, each of which stores data that comprises cumulative metrics.
8. An equalizer as in claim 1, where for N states there are M instantiations of said locally coupled identical processing elements, where M<N, and where delays are inserted between serially coupled processing elements.
9. An equalizer as in claim 1, comprising combinatorial logic for decoding path histories and activating cumulative metric comparators for sorting cumulative metrics so as to obtain the soft bits.
10. An equalizer as in claim 1, wherein said equalizer is embodied within an integrated circuit.
11. A method for processing, on a burst by burst basis, received signal samples having distortion induced by passage through a communications channel, comprising: providing a reduced state equalizer comprised of a logical arrangement of a plurality of instantiations of locally coupled processing elements (PEs) forming a systolic array, each PE having an associated Local Memory (LM) and Channel Look-Up Table (LUT); obtaining an estimate of the channel; initializing Records (R) and PEs and calculating the contents of the Channel LUT; for each received signal sample y, normalizing a cumulative metric in each PE LM and repeating N times, accepting a Record at each PE input; processing the Record and updating each PE LM; and shifting the Record from the PE input to the PE output; and outputting soft values for input to a decoder, the soft values representing an approximation of maximum a posteriori (MAP) probabilities.
12. A method as in claim 11, where each Record comprises a cumulative metric, a trellis path history, old cumulative metrics, a smallest cumulative metric, and existing cumulative metrics, and where each LM comprises the cumulative metric, the path history, new and old smallest cumulative metrics, new and old cumulative metrics, and existing cumulative metrics.
13. A method as in claim 11, where processing the Record comprises computing a branch metric, combining the branch metric with an originating state cumulative metric from the Record to form a new cumulative metric, and performing cumulative metric compares for potentially modifying values stored at least in the LM.
14. A method as in claim 11, where said logical arrangement of the plurality of instantiations of locally coupled processing elements is embodied in one processing element, where said one processing element successively processes N instantiations of the LM against an input Record, where N is number of states.
15. A method as in claim 11, where there are M instantiations of said locally coupled processing elements, where M<N, where N is number of states, and where delays are inserted between serially coupled processing elements.
16. A method as in claim 11, where processing the Record comprises decoding a path history and cumulative metric existences and activating cumulative metric comparators for sorting cumulative metrics so as to obtain the soft output values.
17. A method as in claim 11, where processing the Record comprises each PE operating in parallel to: calculate a reference, r, using data stored in the Channel LUT; compute a difference d=y−r; square the difference value, b=|d|², to form a branch metric; add the branch metric to a cumulative metric of the Record; and examine a path history in the Record and update the Local Memory path history and the Local Memory metrics as needed.
18. A method as in claim 17, further comprising, after all Records are processed, performing a global metric comparison and updating Records and Local Memories as needed.
19. A method as in claim 11, where said equalizer performs Viterbi equalization of an 8-PSK, EDGE (Enhanced Data rate for Global Evolution) signal received through an RF communications channel.
20. A method as in claim 11, where a trellis upon which the processing elements operate comprises one of a fully connected trellis or a partially connected trellis.
21. The equalizer of claim 1, wherein said plurality of PEs are coupled only locally.
22. The equalizer of claim 1 wherein said global compares block is disposed within one of said PEs.
23. The equalizer of claim 1 wherein said global compares block operates to selectively overwrite said data entry when said comparing results in a more extreme metric than said data entry.
24. An equalizer, comprising a logical arrangement of a plurality of instantiations of locally coupled processing elements PEs forming a systolic array for processing in common received signal samples having distortion induced by passage through a communications channel, and outputting soft values for input to a decoder, the soft values representing an approximation of maximum a posteriori (MAP) probabilities, wherein each PE is coupled to a channel lookup table LUT of possible products of channel taps and constellation points.
25. The equalizer of claim 24 wherein a separate LUT is associated with each PE.
26. The equalizer of claim 24 wherein the LUT is loaded at a start of each time slot over which the PEs operate on a set of inputs.
27. The equalizer of claim 26 wherein the channel taps used to load the LUT are fixed at the start of each time slot and said LUT is not adaptive during a time slot once loaded.